Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Code completion problems are an effective type of formative assessment; especially, when used to practice newly learned concepts or topics. While there is a growing body of research in computing education on the use of large language models (LLMs) to support learning content development, the use of LLMs for producing high-quality code completion problems has not yet been explored. In this paper, we analyze the capability of LLMs to generate effective distractors (i.e., plausible but incorrect options) and explanations for completion problems. We utilize common student misconceptions to improve the quality of the generated distractors. Our study suggests that LLMs are capable of generating reasonable distractors and explanations. At the same time, we identify a lack of a sufficiently granular taxonomy of common student misconceptions that would be needed for aligning the generated distractors with the common misconceptions and errors -- a gap that should be addressed in future work.more » « lessFree, publicly-accessible full text available May 14, 2026
-
As large language models (LLMs) show great promise in generating a wide spectrum of educational materials, robust yet cost-effective assessment of the quality and effectiveness of such materials becomes an important challenge. Traditional approaches, including expert-based quality assessment and student-centered evaluation, are resource-consuming, and do not scale efficiently. In this work, we explored the use of pre-existing student learning data as a promising approach to evaluate LLM-generated learning materials. Specifically, we used a dataset where students were completing the program construction challenges by picking the correct answers among human-authored distractors to evaluate the quality of LLM-generated distractors for the same challenges. The dataset included responses from 1,071 students across 22 classes taught from Fall 2017 to Spring 2023. We evaluated five prominent LLMs (OpenAI-o1, GPT-4, GPT-4o, GPT-4o-mini, and Llama-3.1-8b) across three different prompts to see which combinations result in more effective distractors, i.e., those that are plausible (often picked by students), and potentially based on common misconceptions. Our results suggest that GPT-4o was the most effective model, matching close to 50% of the functional distractors originally authored by humans. At the same time, all of the evaluated LLMs generated many novel distractors, i.e., those that did not match the pre-existing human-authored ones. Our preliminary analysis shows that those appear to be promising. Establishing their effectiveness in real-world classroom settings is left for future work.more » « lessFree, publicly-accessible full text available March 3, 2026
-
Worked code examples are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations of a large number of examples typically used in a programming class. This paper explores the opportunity to facilitate the development of worked examples for Java programming through a human-AI collaborative authoring approach. The idea of collaborative authoring is to generate a starting version of code explanations using LLM and present it to the instructor to edit if necessary. The critical step towards implementing this idea is to ensure that LLM can produce code explanations that look meaningful and acceptable to instructors and students. To achieve this goal, we performed an extensive prompt engineering study and evaluated the explanation produced by the selected prompt in a user study with students and authors.more » « less
-
Worked examples are among the most popular types of learning content in programming classes. However, instructors rarely have time to provide line-by-line explanations for a large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generate a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.more » « less
-
Worked examples (solutions to typical programming problems presented as a source code in a certain language and are used to explain the topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for a large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.more » « less
-
Worked examples, which present an explained code for solving typical programming problems are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide explanations for many examples typically used in a programming class. In this paper, we assess the feasibility of using LLMs to generate code explanations for passive and active example exploration systems. To achieve this goal, we compare the code explanations generated by chatGPT with the explanations generated by both experts and students.more » « less
An official website of the United States government

Full Text Available